Zeropadding And FFTConvolve

Chris Tralie

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipd
from scipy.signal import fftconvolve
import librosa

The formula for convolution is as follows:

$(x*h)[n] = \sum_{j=0}^{J-1} h[j]x[n-j] $

where $J$ is the length of $h$.

But computing this sum directly is slow: it takes on the order of $len(x) \cdot len(h)$ operations. We want to be able to use the convolution/multiplication property of the DFT so that we can leverage the Fast Fourier Transform to perform the convolution for us. This is what fftconvolve does, and the result is computed very quickly!
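To make the formula concrete, here is a direct (and deliberately slow) implementation of the sum above; naive_convolve is just a throwaway name for this sketch. It agrees with fftconvolve up to floating-point error:

```python
import numpy as np
from scipy.signal import fftconvolve

def naive_convolve(x, h):
    """Direct evaluation of (x*h)[n] = sum_j h[j] x[n-j], in O(N*J) time."""
    N, J = len(x), len(h)
    y = np.zeros(N + J - 1)
    for n in range(N + J - 1):
        for j in range(J):
            if 0 <= n - j < N:  # x[n-j] is zero outside the signal
                y[n] += h[j] * x[n - j]
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
h = rng.standard_normal(30)
print(np.allclose(naive_convolve(x, h), fftconvolve(x, h)))  # True
```

The double loop is exactly why the direct approach is too slow for audio-length signals.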

In [2]:
sr = 44100
x, sr = librosa.load("aud_jessiesgirl.wav", sr=sr)
ipd.Audio(x, rate=sr)
Out[2]:
In [3]:
sr = 44100
h, sr = librosa.load("imp_JFKTunnel.wav", sr=sr)
ipd.Audio(h, rate=sr)
Out[3]:
In [4]:
y = fftconvolve(x, h)
ipd.Audio(y, rate=sr)
Out[4]:

To use the DFT, we want to do the following:

  1. Perform the DFT of x and of h to get $X[k]$ and $H[k]$, respectively

  2. Compute $Y[k] = X[k]H[k]$ for all $k$

  3. Compute the inverse Fourier transform of $Y[k]$ to get the convolved signal back in the time domain

But there are a few problems with this plan. First, note that the length of the convolution of $x$ and $h$ is actually $len(x)+len(h)-1$.

In [5]:
N = len(x)
M = len(h)
print("N = ", N)
print("M = ", M)
print("len(y) = ", len(y), M+N-1)
N =  1323588
M =  74046
len(y) =  1397633 1397633

This means if we simply perform the inverse DFT of something with length $len(x)$, we won't have enough space to hold all of the echoes. Furthermore, when we derived the formula, we assumed that $x$ and $h$ were the same length, but this definitely is not true in general! What we need to do is zeropad both $x$ and $h$; that is, we create two new signals $xz$ and $hz$, each with $M+N-1$ samples. The first $len(x)$ samples of $xz$ are $x$, and the rest are zeros. Likewise, the first $len(h)$ samples of $hz$ are $h$, and the rest are zeros.
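Before doing this, it's worth seeing what goes wrong without padding. In this tiny made-up example (the arrays are just for illustration), multiplying unpadded DFTs yields a circular convolution whose echoes wrap around to the beginning, while padding to $M+N-1$ samples recovers the ordinary (linear) convolution. Note that np.fft.fft's second argument zeropads for us:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 1.0])

# Without padding, multiplying length-4 DFTs gives a *circular* convolution:
# the echo of the last sample wraps around onto the first one
circ = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, 4)))

# Padding both signals to M + N - 1 = 5 gives the true (linear) convolution
L = len(x) + len(h) - 1
lin = np.real(np.fft.ifft(np.fft.fft(x, L) * np.fft.fft(h, L)))

print(circ)               # circular: the first sample is corrupted by wraparound
print(lin)                # matches np.convolve(x, h)
print(np.convolve(x, h))
```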

In [6]:
xz = np.zeros(M+N-1)
xz[0:len(x)] = x
hz = np.zeros(M+N-1)
hz[0:len(h)] = h

We note that zeropadding doesn't change the underlying frequency content; it simply "interpolates" values in between the original bins. In other words, we end up with more frequency bins, but the new bins are determined by the ones that were already there; no new information has been added.

In [7]:
plt.subplot(211)
plt.plot(np.abs(np.fft.fft(h)))
plt.title("h")
plt.subplot(212)
plt.plot(np.abs(np.fft.fft(hz)))
plt.title("Zeropadded h")
plt.tight_layout()
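We can also check this numerically. As a sketch (with an arbitrary random signal), if we pad to an integer multiple of the original length, every original DFT bin reappears exactly at a correspondingly spaced index; the bins in between are the interpolated ones:

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal(64)

# Zeropad h to 4x its original length
hz = np.zeros(4 * len(h))
hz[:len(h)] = h

H = np.fft.fft(h)
Hz = np.fft.fft(hz)

# Every 4th bin of the padded DFT equals a bin of the original DFT;
# the three bins in between each pair are interpolated from the same spectrum
print(np.allclose(Hz[::4], H))  # True
```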

Finally, our zeropadded signals have the same length, and that length is long enough to hold all of the echoes. So we can implement our plan at last: multiply in the frequency domain, then take the inverse Fourier transform to get back to the time domain, and we get our convolution!

In [8]:
X = np.fft.fft(xz)
H = np.fft.fft(hz)
Y = X*H # Using the fact that convolution is multiplication in the frequency domain
y = np.fft.ifft(Y) # Perform the inverse DFT
y = np.real(y) # The result should be real; discard the tiny imaginary parts left by roundoff
ipd.Audio(y, rate=sr)
Out[8]:
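As a final sanity check (on small random signals made up for this purpose), the zeropad-multiply-invert recipe above agrees with scipy's fftconvolve, which is essentially this same procedure with some extra care about FFT sizes:

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h = rng.standard_normal(100)

# Zeropad both signals to M + N - 1 (fft's second argument does the padding),
# multiply in the frequency domain, and invert
L = len(x) + len(h) - 1
y_manual = np.real(np.fft.ifft(np.fft.fft(x, L) * np.fft.fft(h, L)))

print(np.allclose(y_manual, fftconvolve(x, h)))  # True
```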